language model
As of 2023, LLMs have far surpassed all of the approaches listed below.
count-based models
N-grams (e.g. trigrams)
performance is mediocre
cannot solve the zero-frequency problem (unseen N-grams get probability 0)
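A minimal sketch of the zero-frequency problem in a count-based trigram model (the corpus and function names are illustrative, not from the original note): a maximum-likelihood estimate assigns probability 0 to any trigram absent from the training data.

```python
from collections import defaultdict

def train_trigram(tokens):
    # Count trigrams and their bigram contexts.
    tri = defaultdict(int)
    bi = defaultdict(int)
    for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
        tri[(a, b, c)] += 1
        bi[(a, b)] += 1
    return tri, bi

def prob(tri, bi, a, b, c):
    # Maximum-likelihood estimate of P(c | a, b).
    # A trigram never seen in training gets probability 0:
    # this is the zero-frequency problem.
    if bi[(a, b)] == 0:
        return 0.0
    return tri[(a, b, c)] / bi[(a, b)]

tokens = "the cat sat on the mat".split()
tri, bi = train_trigram(tokens)
print(prob(tri, bi, "the", "cat", "sat"))  # 1.0 (seen in training)
print(prob(tri, bi, "the", "cat", "ran"))  # 0.0 (never seen)
```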
Kneser-Ney smoothing
it also reproduces the power-law distribution of word frequencies
RNN language model #RNNLM
BPTT is prone to vanishing gradients.
→LSTM
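A toy illustration (with hypothetical numbers) of why BPTT vanishes: backpropagating through T time steps multiplies the gradient by the recurrent derivative at each step, so a per-step factor below 1 shrinks the gradient exponentially in T — the motivation for LSTM's gating.

```python
def bptt_gradient_magnitude(step_derivative, T):
    # Product of the per-step recurrent derivative over T time steps,
    # as accumulated by backpropagation through time.
    g = 1.0
    for _ in range(T):
        g *= step_derivative
    return g

print(bptt_gradient_magnitude(0.9, 10))   # ~0.35: still usable
print(bptt_gradient_magnitude(0.9, 100))  # ~2.7e-5: effectively vanished
```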
---
This page is auto-translated from /nishio/言語モデル using DeepL. If you see something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.